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f^ f Abstract 

_j_ We develop a probabilistic framework for global modeling of the traf- 

fsi fie over a computer network. This model integrates existing single-link 

(-flow) traffic models with the routing over the network to capture the 
global traffic behavior. It arises from a limit approximation of the traffic 

>T" fluctuations as the time-scale and the number of users sharing the network 

^H grow. The resulting probability model is comprised of a Gaussian and/or 

CZ2 a stable, infinite variance components. They can be succinctly described 

, ^ , and handled by certain 'space-time' random fields. The model is vali- 

dated against simulated and real data. It is then applied to predict traffic 

t-H fluctuations over unobserved links from a limited set of observed links. 

^- Further, applications to anomaly detection and network management are 

C ■ briefly discussed. 

m 
en 
^ 1 Introduction 

f^ Understanding the statistical behavior of computer network traffic has been an 

important and challenging problem for the past 15 years, because of its impact 

on network performance and provisioning J5TJ [5JJ1 HS1 US] an d on the potential 

J> for development of more suitable protocols [551 HE]- Since the early 1990s it 

has been well established that the traffic over a single link exhibits intricate 
temporal dependence, known as burstiness, which could not be explained by 
traffic models developed for telephone networks [20] ■ This phenomenon could 
be understood and described by using the notions of long-range dependence 
and self-similarity |12j , which in turn are affected by the presence of heavy tails 
in the distribution of file sizes [55] . A bottom- up mechanistic model for single 
link network traffic that is in agreement with the empirical features observed in 
real network traces was presented in [38] . A competing model based on queuing 
ideas was studied in [25] . These works lead to many further developments (see 
eg M)- 

Advances in technology that allowed the acquisition of direct, through sam- 
pling [TUl [42] , and indirect [19] measurements have allowed researchers to ex- 
amine the characteristics of traffic in entire networks [HI [15j [311 HI] > based on 



statistical modeling analysis. On the other hand, an analogue of the mechanistic 
models available for single link network traffic is not available. Such a model 
would allow better understanding of network performance |13| 21 and detection 
of anomalous behavior |27) . Further, it would manage to capture and explain 
statistical relationships between flows traversing the network at all time scales 
(time) and across all links (space); the latter represents a fairly tall requirement, 
which may also prove rather impractical given the underlying complexity (pro- 
tocols, applications) and heterogeneity (physical infrastructure, diverse users) 
of modern networks. 

Our objective in this paper is to propose a mechanistic model that captures 
several fundamental characteristics of network-wide traffic and thus constitutes 
a partial solution for this challenging problem. The model is based on modeling 
user behavior on source-destination paths across the network and then aggregate 
over users and over time, thus developing a joint 'space-time' probability model 
for the traffic fluctuations over all links in the network. This model reflects 
the statistical dependence of the traffic across different links, observed at the 
same or different points in time. We demonstrate the success of our modeling 
strategy in the context of network traffic prediction - a problem with important 
implications on network performance, provisioning, and management. 

The remainder of the paper is structured as follows. In Section pO] we review 
briefly the existing and relatively well-understood theory of single-flow (link) 
models for the temporal dependence in network traffic. Long-range dependence 
and heavy tails play a central role. In Section|3j we postulate our network-wide 
model based on combining single-flow models through the routing equation. 
We show that the scaling limit of such a model is a combination of fractional 
Brownian motions and infinite variance stable Levy motions. A succinct repre- 
sentation of these processes is given in Section |3.2| via the functional fractional 
Brownian motion and functional Levy stable motion. The resulting model is 
then used in Section [4] to solve the network kriging problem, i.e. to predict the 
traffic fluctuations on a unobserved link from a limited set of measurements of 
observed links. In Section [5] we use extensive NetFlow data of sampled network 
traffic to obtain approximations of the flow-level traffic Xj (t) . These data are 
then used to validate our model and demonstrate the success of the network 
kriging methodology. We conclude in Section [6] with some remarks on future 
applications, statistical problems on networks, and further extensions of the 
network-wide probabilistic model. 

2 Problem Formulation 

Consider a computer network of L links and N nodes. The network typically 
carries traffic flows (via groups of packets) from any node (source) to any other 
node (destination) over a predetermined set of links (route). This can be for- 
mally described by the routing matrix A = (o£j)lxj, where 

J 1 , route j involves link £ 
3 1 , otherwise, 



and where J is the total number of routes (typically, J = N(N — 1)). 

We describe next the physical premises of our modeling framework. We 
assume, for simplicity, that the traffic is fluid. That is, the amount of data 
(bytes) transmitted over link £ during the time interval (a, b) is J Yi(t)dt, where 
Yi(t) is the traffic intensity (bytes per unit time) over link £. Let also Xj{t) 
denote the traffic intensity at time t over route j, 1 < j < J . Then, assuming 
that traffic propagates instantaneously over the network, we obtain the following 
routing equation: 

Y(t)=AX(t) 1 (1) 

where X(t) = (Xj(t))i<j<j and Y(t) = (Yt(t))i<t<ij. This relationship is 
valid only to the extent that traffic propagates instantaneously along the routes. 
Thus, (fTl) cannot be adopted over the finest, high-frequency time scales where 
packet delay plays a central role. On the other hand, for all practical purposes, 
the routing equation holds over a wide range of time scales greater than the RTT 
(round trip time) for packets in the network [311 142] . This equation captures 
the fundamental relationship between the traffic intensity over different routes 
in the network and the resulting load, incurred on the links. 

From a physical perspective, the computer network is merely used to 'trans- 
port' information from source nodes to destination nodes. In normal (uncon- 
gested) operating regime, the traffic is carried seamlessly and the traffic intensi- 
ties Xj(t) are driven solely by the demand along the routes j, 1 < j < J ' . Thus, 
as a first approximation one may view the Xj(ty% as statistically independent 
in 3, 1 < j < J ■ Therefore, in view of (fTl), the statistical dependence between 
Yi 1 (t) and Ye 2 (t) for two links t\ and £2, is governed by the set of routes Xj(t) 
that use both £\ and £2- 

In view of the above discussion, guided by the routing equation (flj, we 
obtain a global model for the traffic intensity Yi(t), 1 < £ < L, t > 0. The 
temporal dependence of the flow-level traffic Xj(t) can be described well by 
the existing mechanistic models exhibiting long-range dependence and heavy 



tails (see Section 2.1). The independence of the X,(£)'s in j is a questionable 
assumption when the network is not in equilibrium or it is congested. Indeed, 
if two routes share a congested node, then the feedback mechanism of TCP 
clearly induces dependence between the two flows. Further, since every TCP 
session involves ACK (acknowledgment) packets traversing along the reverse 
route, then in practice one expects dependence between the forward and reverse 
flows for a given pair of a source and destination. Our experience with NetFlow 
data for the Internet2 backbone network suggests however that for the present 
utilization levels of the network (about 10% to 20%) the X,-(£)'s are nearly 
uncorrelated in j (see Fig. [2] in Section [5]). The correlation is strongest but still 
rather weak among the forward and reverse flows (see also [3TJ). 

Therefore, as a first attempt to model globally the dependence structure 
of the network across all links and in time, we advocate adopting the simple 
assumption of independence of the flow-level traffic. Our methodology can be 
extended to cover more complex scenarios of dependence between forward and 
reverse flows, as well as 'second-order' effects of dependence between flows trig- 



gered by congestion. This should be done with caution however since the chaotic 
behavior induced by the TCP feedback is not well-understood on network-wide 
level. 

2.1 A Brief Overview of Single Link Traffic Models 

We start with a brief review of single-link traffic models, since a number of 
their features are incorporated into our network-wide model. Such models 
are built on the paradigm of multiple users sharing a link. Depending on 
the regimes prevalent in the network, one obtains two qualitatively different 
asymptotic models for the cumulative traffic fluctuations. One regime leads to 
finite-variance, Gaussian models that exhibit long-range dependence and self- 
similarity. The other regime yields infinite variance processes with independent 
increments. 

2.2 Activity rate models: two limit regimes 

Consider a fixed route on the network and suppose that M independent users 
share this route. Let {X(t)} t >o denote the traffic intensity of one such user in 
bytes per unit time. Thus J X(t)dt is the total traffic (bytes) generated by the 
user during the time interval (a, 6). It is assumed that {X(i)}t>o is a strictly 
stationary stochastic process with finite mean. 

Following the framework in |24j . let {(Tj, Zj)}j£% be a stationary marked 
point process of arrival times Tj's in K with marks Zj's. At time Tj, the user 
initiates a transmission at constant unit rate, which lasts time Zj. Thus, the 
traffic intensity at time t equals: 

X(i)=^l(T ; ,<i<T,+Z i ), (2) 

where • • • < Tq < < T\ < • ■ ■ . One can recover the following two popular 
traffic models as special cases: 



• 



M/G/oo model: If the Tj's are arrival times of a Poisson point process 
with constant intensity, independent of the marks Zj'a, then {X(t)} t >o 
becomes the M/G/oo model. 

On/ Off model: On the other hand, if the Zfs and the X^-'s are dependent 
and such that: 

Tj l0n '•= Tj, 7}, off := Tj + Zj < Tj + \ = Tj + i, on , 

then X(t) follows the On/Off model, i.e. a period of activity ('On') is fol- 
lowed by an idle period ('Off'). It is further assumed that the On periods: 
Uj,on '■— Zj and the Off periods C/j, ff := ^j+i — Tj + Zj, arc mutually 
independent and identically distributed with laws F on (x) = P{f/i i0n < x} 
and F oS (x) = P{J7i,off < x}. 



Remark. The initial On period Ti i0n has such a distribution as to ensure that 
{X(t)} is stationary. This can happen only if the On and Off periods have finite 
means. The work [23] addresses the case of activity rates with very heavy tails, 
which can have infinite means. 

In the context of network traffic, the durations of the user activity Zj's are 
modeled with heavy tailed distributions with finite mean but infinite variance, 
since they can be linked to the ubiquitous presence of heavy tails in computer 
networks (file sizes, web pages, etc. see e.g. [2 [231 IB])- The heavy tailed nature 
of the durations, implies that the process X(t) of user activity is long-range 
dependent (LRD). The intimate connection between the long-range dependence 
phenomenon and self-similarity provides an appealing mechanistic (physical) 
explanation of the cause of burstiness in network traffic (see e.g. [201 H2 HO] ; 
[26] and the references therein). 

For brevity, we focus on the On/ Off model and suppose that the tails of the 
On and Off durations are heavy, i.e. as x — > oo: 

I - F on (x) := F on (x) ~ c on x~ a ° n and F s(x) = c oS x~ a ° !l , 

for some constants c on , c s > and tail exponents, such that 

1 < a := min{a on , a g } < 2. (3) 

Relation (pjl) then implies that X(t) is LRD with Hurst exponent 

H=^e (1/2,1), (4) 

that is, for some constant ex > 0, 

Cav(X(t),X(0)) - c x t 2H - 2 , sst^oo 
(see e.g. [2"1]). 

2.3 Multiple Sources Asymptotics: Long— Range Depen- 
dence and Heavy Tails 

Let now {X™(t)}, 1 < i < M be independent and identically distributed sta- 
tionary processes modeling the traffic intensities of M users sharing a given 
route. Then, the cumulative traffic over the route generated by the users is: 

x m 



X*(T,M) := f y]X®(jk)dt 



We are interested in the asymptotic behavior of the cumulative traffic fluctua- 
tions about the mean: 

X * (T, M) := X* (T, M) - EX* (T, M). 



As shown in the seminal work of [35], if the -X"W(t)'s are On/Off processes, 
then 

£ lim -^{C lim J=X*(Tt,A/)} = {B ff (t)} t > , (5) 



M 

where Bh = {Bfi(t)}t>o is a fractional Brownian motion (ffim) with self- 
similarity parameter H as in Q and where '£ lim' denotes finite-dimensional 
distributions convergence. Recall that the ffim Bh is a zero mean Gaussian 
process with stationary increments, which is self-similar, i.e. for all c > 0, we 

have {Bh (ct)} t >o — {c H Bh (t)}t>o- One necessarily has that H £ (0,1) and, 
for some a 2 — Var(Sjj(l)) > 0: 

Cov(B H (t),B H (s)) = ^ (\t\ 2H + \s\ 2H -\t- s\ 2H ) ,t,8>0 

(see e.g. [3U]). 

Relation ^ shows that the fluctuations of the cumulative traffic about its 
mean behave asymptotically like the fractional Brownian motion, as the number 
of users M and the time scale T are sufficiently large. The increments G(k) := 
Bn{k) — Bn(k — 1), k = 1, 2, . . . , of ffim then can then serve as a model for the 
traffic traces of the number of bytes transmitted over the network over certain, 
sufficiently large time scales. 

The order of the limits in |5]) is important. If one takes T — > oo first and 
then M — > oo, as shown in [38], one obtains: 

Now the limit process A a = {A a (t)}t>o has independent and stationary incre- 
ments with a— stable distributions, with a being as in Efy. It is the Levy stable 
motion - the infinite variance counterpart to the Brownian motion. 

Relations (f5j) and |6| show two different regimes for the network. The first 
involves many users relative to the time scale and the second, just a few users 
relative to the time scale. As shown in [53] (see also [HI [2HJ [H]), one can 
consider the limit when the number of users At = M(T) grows to infinity, as a 
function of the time scale T. Then: 

• (fast growth) If (M (T)T) 1 / Q /T -> oo, as T -> oo, then 



{^Wm X o(Tt,M)} = { B H (t)} t> _ . 



C lim , ,. | , , , , 

T^oo lT H y/M(T) it>o 



(slow growth) If (M(T)T) l / a /T -> 0, as T ->• oo, then 



I m-^ ^^l,^^^ 



C t™oo l(TM(T))i/«- uv --'-'J t > 



The fast growth scenario shows that if the number of users M(T) grows 
relatively fast, then the same limit as in (;5) is achieved. The slow growth 
regime on the other hand, yields the stable Levy motion in the limit, when 
there are relatively few users sharing the link. The intermediate regime when 
(M(T)T) x / a /T -> c G (0, oo) is considered in [H]. 

This abundant theory offers a multitude of tools for modeling the temporal 
dependence of traffic traces in various regimes. For example, the users need not 
be of the same type. As in [9] one may consider q classes of users Mk, 1 < k < q, 
where M = J2l=i ^fci an< ^ Mk(T) — > oo as T — > oo. The users within a given 
class are of the same type with parameters a^ E (1,2), and.Hfc := (3— afc)/2, 1 < 
k < q. By balancing the rates of the Mfc(T)'s one can obtain in the limit 

C T \imjJ-X*(Tt 7 M)} t > = Yu Bh * + E A <**> 
^ ' kef kes 

where B Hk = {B Hk (t)} t > and A ak = {A ak (t)} t > a , 1 < k < q are independent 
fBm's and Levy stable motions, respectively. Here {1, ■ • • , q} = J- U S is the 
partition of the groups of users into subsets of fast and slow growth regimes, 
respectively. 

Similar results were shown to hold for the M jG /oo and other activity rate 
models (see e.g. [21] )• Remarks. 

1. If the individual user behavior is modeled by M/G /oo processes with 
heavy-tailed, infinite variance durations Zj's, then similar asymptotic re- 
sults hold for the cumulative traffic fluctuations. In fact, as shown in [23] , 
this is so for the general activity rate model in S. 

2. As argued above, by balancing the rates of multiple groups of users, one 
can obtain complex hybrid models, composed of fBm's and Levy stable 
motions. In practice, however, typically one component dominates. In 
fact, as shown in [24], the fBm limit is more robust than the Levy stable 
motion with respect to the type and the regimes of the activity rate models 
considered. 

In fact, the fundamental theorem of Lamperti (see eg Theorem 2.1.1 in 
|llj ) implies an interesting robustness and homogeneity property. Namely, 
suppose that XW(£)'s are all stationary in t. If the the time-scale limit 

£ l im {-LXo(?t,M)h>o = m,M)}t>o, 

T->-oo a(l ) ~ ~ 

is non-trivial, then it is necessarily self-similar. That is, {£(c£, M )}t>a = 
{c H £(t, M)}, for all c> with some H > 0. 

This implies that if the number of users M is either fixed or already large 
enough for the Gaussian asymptotics to hold, then the time-scaling limit 
is necessarily either a single Levy stable motion, or a single fractional 
Brownian motion. 



Thus the complex hybrid models involving sums of multiple fBm's and 
Levy stable motions are rather fragile. That is, they may occur only if 
a careful balance between the rate of growth of the users and the time- 
scale is imposed. In reality, the single-fBm and single-Levy stable motion 
provide good, first-order limit approximations of traffic fluctuations that 
remain valid under changes of time-scales. 

This observation is the reason why we advocate studying first the simpler, 
self-similar models involving either a single fBm or a single Levy stable 
motion. Accounting for the hybrid models involves careful considerations 
of time-scales, which presents formidable statistical challenges. 

3 Network— Wide Traffic Modeling 

3.1 Asymptotic Approximations 

As discussed in the introduction, we assume that traffic is fluid and it propagates 
instantaneously through the network so that the routing equation (fTl) holds. As 
in Section 2.1 we model the traffic intensity Xj(t) over route j as a compo- 



sition of Mj independent users. We suppose, in addition that the A^(t)'s are 
independent in j and composed of Mj independent and identically distributed 
(i.i.d.) On/Off sources: 

X J (t)=J2xf\t), 1<3<J. (7) 

4=1 

We then obtain the following results: 

Theorem 1 Let Xj(t) 's be as in frfh, where the On/Off components have com- 
mon parameter a as in pi). Suppose that Mj ~ r(j)M, M — > oo, for some 
constants r(j) > and let Y{t) be as in (fTl). Then, 

C lim C lim — / (YM - EY(0))dT 

T->oo M^ooT H ^MJo 

- {AB H (t)} t > , (8) 

where Bu{t) = (r(j)B^ (t))i<j<j and B^ (t) 's are i.i.d. fBm's with parameter 
ff = (3-a)/2G(l/2,l). 

Theorem 2 Under the conditions of Theorem^ we have 

1 " Tt 



C lim C lim —. — - / (Y(r) - EY(0))dr 

= {AA a (t)} t > 0l (9) 

where A Q (i) = (r(j)Aa (t))i<j<j an d As (t)'s are i.i.d. Levy a—stable mo- 
tions. 



Theorems [T] and [2] correspond, respectively, to the fast and slow regime 
asymptotics in the single-flow case. Their proofs follow readily from the well- 
known single-flow results with an application of the continuous mapping theo- 
rem. 

3.2 A representation via functional Levy and functional 
fractional Brownian motions 

In this section, we introduce two classes of stochastic processes, indexed by 
functions, which can be used to succinctly represent the limit processes arising 
in Theorems [T] and [2] The purpose of this more abstract treatment is to develop 
tools and insight that can be used in statistical inference for the network models. 
Functional fBm: Consider a measure space (E, /1) and the set of functions 

L 2H (^) = {f:E^R, 11/111* := f \f\ 2H d^ < 00}, 

Je 

where H € (0, 1). Introduce the functional 



a 2 



4>2 H {f,g) ■= y (11/11$ + \\g\\Z 11/ - g\\i%), (10) 

for f,g € L 2H ([i) and a > 0. 

The functional (/, g) 1— > cj>(f, g) resembles the auto-covariance function of an 
fBm. It turns out that <j>(f,g) is positive semi-definite (see Proposition p] in the 
Appendix). One can thus define a Gaussian process with covariance <f>2H' 

Definition 1 LetO < H < 1. A zero mean Gaussian process B — {_B(/)}y ei 2H^,\ 
indexed by the functions f G L (/i) is said to be a functional fractional Brow- 
nian motion (f-fBm), if: 

Cov(B(f),B(g)) =EB(f)B(g) = fo H (f,g), f,geL 2H (fi). 

It turns out that the limit process in Theorem [l] can be expressed in terms 
of a functional fBm. Indeed, let E — {1, • • • , J} and let the measure [i be the 
counting measure on E. Consider the f-fBm B = {-B(/)}/gL 2if (/i)- 

Proposition 1 For the limit process in (|8J) , we have 

{AB H (t)}t>o = {(B(tf e )) 1<e<L } . 

Here fi(u) = r(M) 1 / i? l J 4 f (M), where Ag c {1, • • • , J} denotes the set of routes 
that use link £, 1 < I < L and r(u), 1 < u < J is as in TheoremYn 

The proof is given in the Appendix. The next result shows the basic properties 
of f-fflm's. 



Proposition 2 Let H <G (0, 1] and B = {B(f)}f £L 2H r^ be f-fBm. 
(i) The process B is H ~ self-similar: 

{B(cf)} feL 2H M ±{c H B(f)} feL2HM , (Vc>0), (11) 

where = denotes equality of the finite-dimensional distributions, 
(ii) The process B has stationary increments: 

{B(f + h)- B(h)}fei**M = {B(f)} feL 2H M , (12) 

for all he L 2H (fi). 

(Hi) If fg = ix— (i.e., then B(f) and B(g) are independent. 

(iv) Bf(t) := B(tf), t G R is an ordinary fBm process. 

(v) If H 7^ 1, then B(f) + B(g) = B(f + g). almost surely, if and only if fg = 

0, /i—a.e. (Note that by (ii) above, we always have B{f) — B(g) = B(f — g).) 

The proof is given in the Appendix. 

Now, to gain more intuition behind the role of f-fBm in representing the 
limit process in Theoremfll suppose that r(j) — 1 therein, i.e. all routes involve 
the same number of users Mj — M. Consider the random variables B{tfi x ) and 
B(sfe 2 ) representing the asymptotic cumulative fluctuations of traffic over links 
l\ and £2 respectively. Since fi = \a ( is merely an indicator function, we have: 

EB(tf ei )B(sf i2 ) = ~(\t\ 2H »(A ei ) + \s\ 2H n(A e2 ) 
-\t- s\ 2H tx{A^ n A l2 ) - \t\ 2H \x{A lt \ A l2 ) 
-\s\ 2H n(A t2 \A ei )) 

= ^(A ei n A i2 )^(\tf H + \s\ 2H - \t - s\ 2H ). (13) 

Recall that Ai C {1, • • • , J} is the set of all routes that involve link I. Thus, the 
last relation has the following natural interpretation. The spatial dependence 
between the links t\ and £2 is governed solely by the routes they have in common, 
i.e. the set Ai x n Ag 2 . On the other hand, the temporal dependence follows the 
fBm model. In particular, B(tfg, x ) and B(tfi 2 ) are independent if and only if 
links £\ and £2 have no common routes, i.e. fi(A£ t n A^ 2 ) = 0. 

Functional Levy stable motion: As for f-fBm, consider the measure space 
(E,/j,) and the set of functions L 1 (fi). 

Definition 2 Leta*E (1,2). Consider a zero-mean a— stable measure M a {dx,du) 
onM. x E with control measure dx x /i(du) (see 1301). Let 

Hf) : = / \^(-oo.f(u)}{x)-l(-oofl){x))M a {dx,du), 

JRxE v ' 

for any f e L (fj,). The process {A(/)}/ g ii( (ti ), indexed by the functions f € 
L a (n) is said to be a functional Levy stable motion (f-Lsm). 



10 



As for f-fflm, we have: 
Proposition 3 For the limit process in Q, we have 

{AA a (t)} t>0 = {(A(tf e )) 1<e<L } . 

~ I J i>0 

Here fe(u) = r(u) 1 ' Q lA £ ( - u), where An C {1, • • • , J7} is the set of routes involving 
link £, 1 < £ < L and r(u), 1 < u < J is as in Theoremyn 

The properties of the f-Lsm parallel those of f-fBm. For example, the process 
{A(t/)} f >o is a Levy stable motion. 

Proposition 4 Let a £ (1,2) and {A(f)} f^^ii^ be a junctional Levy a-stable 
motion. We then have: 

(i) The process A is 1/a self-similar: 

{A(cf)} feLl(p) £ {c^ a A(f)} feLlMl (Vc > 0). 
(ii) A has stationary increments: 

{A(/ +h)- A(h)} feLl{p) £ {A(/)} /eLV) , (W» G i 1 ^)). 

(m,) A(/) and A(g) are independent if and only if /g < fi—a.e. 

(iv) For all < /i < f% < • • • < /„ ('mod /ij and n£N, i/ie increments 

A(A), A(/ 2 ) - A(A), • • • , A(/„) - A(/ n _!), 

are independent. 

fyj A(/ + .g) = A(/) + A(g), if and only if fg = ^-o.e. 
(t>z) {A(tf)} t >o is an ordinary Levy a-stable motion. 

The proof is given in the Appendix. 

Remark: Here, for simplicity, we focus only on the case a £ (1,2), where the 
mean of M a is finite and set to zero. Implicitly, the skewness coefficient function 
is assumed to be constant. The functional Levy stable motion can be defined 
for all a £ (0, 2), provided that the random measure M a is strictly stable with 
constant skewness intensity function. For example, the symmetric a— stable 
case is particularly simple. For more details of a— stable stochastic integration, 
see eg [50] , 

Integral representation of f— fBm: The explicit representation of the f-Lsm 
processes suggests that the f-fBm may be also conveniently handled through 
stochastic integrals. Indeed: 
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Proposition 5 For all H £ (0, 1), we have that: 

B(f) := / ((/(«) - xf + - 1/2 {-xf + - 1/2 ) W(dx, du), 



is o functional fBm, where W{dx, du) is a Gaussian random measure with con- 
trol measure dx x [i(du) and (x) + := max{a;,0}. 

The proof is given in the Appendix. 

The last representation provides further tools as well as intuition into the 
nature of the f-fBm. Indeed, suppose that E is discrete. Then, W{dx,{u{\) 
and W{dx, {^2}) are independent Gaussian measures on K., for u\ 7^ u-i. Thus, 
the stochastic integral over E becomes a sum of independent processes, each of 
which has the form of a fractional Brownian motion. That is, 



B(f) = J2 Bi H (/(«)). 



ueE 

where {B^ (i)} tS R are i.i.d. fBm's indexed by u £ E. 

Thus, the functional fBm may be viewed as a suitable, infinitesimal sum 
of independent fBm's each indexed by the corresponding values f{u) of the 
functional argument /. This is essentially why the f-fBm provides a succinct 
representation of the limit process in Theorem [Tl 

In the next section, we utilize the simple parametric form of the limit ap- 
proximations to solve the network kriging problem. 

4 An Application to Network Kriging 

In view of Theorems [T] and [2] one can model the joint distribution of the traffic 
traces Y^(t), 1 < £ < L as increments of functional fBm or functional Levy 
stable motion. Here, we focus on the fast regime, where according to Theorem 
[TJ the traffic traces are approximated by Gaussian processes. 
Consider the traffic time series 

rk6 



Y 5 {£,k):= / Y e (t)dt, fe = l,2,--- 

J(fc-1)<5 



of the number of bytes traversing link £ during the fc-th time interval ({k — 
1)5, kS), for a fixed time scale S > 0. Guided by the multiple sources asymp- 
totics, let B{f), f £ L 2H (E, ^i) be an f-fBm, where E = {1, • • • , J} and [i is 
the counting measure on E. Set, 

Y 5 {t,k)-- l x Y {t)+B{kf i )-B{{k-l)fe), fc = l,2,... , 

where fe(u) — r(u) 1 / ff 1a £ (u), u £ E = {1, • • • ,J} and Ag is the set of all 
routes using link £. Here [Iy(£) — EYi(£, k) is the traffic mean over link £. 
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Assuming that the mean structure fly = (fJ>Y(£))i<e<L an d the parameters 
H and r(u) of the limit f-fBm model are known, then one recovers the joint 
distribution of the traffic load on the network across all links £ and time slots 
k. This allows one to address a number of fundamental statistical problems. 

Instantaneous prediction (network kriging): Observed are the traffic loads 

V:={Y(£,t), l<t<t , £eO}, (14) 

over the set of links £ € O C {1, • • • , L} at time slots t, 1 < t < to- Predict the 
traffic load Y(£o,to) on a unobserved link £q, in terms of the data T>. 



Spatio-temporal prediction: Given the data V in (14 1, predict the traffic 
loadY(£,to + h) on a observed or unobserved link £, at some future time to + h > 
t. 

Remarks: 

1. The estimation of the Hurst parameter H is a well-studied problem (see 
e.g. j37j[U|3l[35l|33].) We advocate the use of robustified wavelet methods 
to obtain H in practice (see e.g. [Tl l3"2"l I3"4"l 1551 155] .) 

On the other hand, the estimation of the mean structure fly — (hy(£))i<i<l, 
and the underlying parameter r(u), 1 < u < J in the covariance structure 
are important and challenging problems in practice. We address these 
problems in a general statistical framework with the help of latent models 
and auxiliary NetFlow data sets in the forthcoming work |39j . 

2. In the interest of space, we focus only on the first, instantaneous prediction 
problem. The /i-step prediction problem can be addressed similarly (see 
e.g. |Sg|.) 

We refer to the instantaneous prediction as network kriging because of 
its resemblance to geostatistical prediction problems. The term network 
kriging was introduced first, to best of our knowledge, by Chua, Kolaczyk 
and Crovella in 5 in the context of predicting eg delays along routes from 
active network measurements of flows in the network. Here, our setting is 
different since the focus is link rather than flow measurements. 

For simplicity, let S = 1, time t € N be discrete, and (with some abuse of 
notation) 

?(t) := (Y s (l,t))i<t< L , t = l,2,.... 

Partition the vector Y(t) and the rows of the routing matrix A into two com- 
ponents, corresponding to the indices of the unobserved ('u') and observed ('o') 
sets of links: 

*«>-(£$) ™ d A =(t 

Proposition 6 Let Y(t) = AX(t), where EJ?(0) = [i x , and E x := E(X(0) 
fix){X(Q) — fix) ■ Suppose that the matrix AgllxA^ is invertible. Then: 
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(i) The statistic 

Y u (t ) = A uf i x + A^xAKAo^xAD-^Yoito) ~ A of i x ) (15) 



is a unbiased predictor for Y u (fo) in terms of the data T> in (14). The mean- 
squared error (m.s.e.) matrix ofY u (to) is: 

m.s.e.{Y u (t Q )\V) (16) 

:= E((F u (io) - Y u (t ))(Y u (t ) - Y u {t )) l \v) 
= A a Z x Ai - AuZxAKAoZxAD^AoZxAi, 

where the last expectation is conditional, given the data T>. 

(ii) The statistic Y u (to) in ( |15| is the unique best unbiased m.s.e. predictor of 
^u(io) i n terms of the data T> in (|14[). That is, for any other unbiased predictor 
Y*(to), we have that 

m.s.e.(Y u {t )\V) < m.s.e.(F M *(t )P) (17) 

where the last inequality means that the difference between the matrices in the 
right- and the left-hand sides is positive semidefinite. 

The proof is given in the Appendix. We now make a few important observations. 
Remarks: 



1. If Y(t) is non-Gaussian, then the estimator in ( |15[ ) remains the best linear 
unbiased predictor (b.l.u.p.) of Y u (to) in terms of the data T>. Relations 



(16) and (17) continue to hold, where now Y*(to) is an arbitrary linear in 



V, unbiased predictor of Y u (to) 



2. By Gaussianity, it is easy to see that Y u (to) in (15) also maximizes the 
conditional likelihood of Y u (to), given the data. 

3. Note that only the observations Y (to) at the present time £q are involved 



in (15). This is due to the product form of the space-time covariancc 
structure of the functional fBm (13) and Proposition [9] below. 



4. If the matrix A Tix-^ IS singular, then one can replace the inverse in ( 15 1 



and ( 16 1 by the Moore-Penrose generalized inverse. This corresponds to 
focusing on the range of Aq'ExA^ where the latter matrix is invertible. 
The statistic Y u (to) remains the b.l.u.p. In practice, A^xA^, is singular 
only when the traffic over a link is a perfect linear combination of the 
traffic over another set of links. This occurs in tree-type topologies, for 
example, where the internal nodes do not generate traffic. 

5. In the slow regime (Theorem^]) the functional Lsm infinite variance model 
for Y(t) should be used. The prediction problems can then be also ad- 
dressed but not with respect to the square loss. One can consider mini- 
mizing E\Y u (to) — Y u (to)\ p for p < a or, equivalently, the scale coefficient of 
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the a-stable variable. In this case, no closed-form solutions are available 
but one can obtain numerical expressions for the best linear predictors. 
Our experiments indicate that the coefficients of these linear predictors 



are often very close to those of the least squares predictor in ( 15 ) 



The fact that the b.l.u.p. Y u (to) in Propositionkjldoes not depend on the past 
data Y (t), t < t shows that the Y u (t ) is in fact the standard kriging predictor, 
which is well-studied in spatial statistics (see eg [5]). We shall therefore refer 
to Y u (to) as to the standard network kriging predictor. 

The following result provides the general solution to /i-step prediction prob- 
lem. We start by introducing some notation. Consider the Toeplitz matrix: 

r m+ i := (7x(|«- j|))o<*j<m 
and the vector j m+1 (h) = ( r fx(h + j))o<j< m , where 

7*(fc) = y(>+ir+ifc-ir-2|fcr 

Since 7x(0) — a 2 > and jx(k) — > 0, as k — > oo, the matrix T m+ i is invertible, 
for all m 6 N (see eg Proposition 5.1.1 in 4J). 

Proposition 7 Assume the conditions of Proposition p| Let \ii — Aifix — 
EYi(t), i€{'u','o'}. 
(i) The statistic 

m 

Y o (t + h):= f i + Y / Cj{h)(Xo(to - j) - Mo), (18) 

is a unbiased predictor ofY (to + h), h>l viaV, where c(h) = (cj(h))o<j< m — 



■ m-\- 



ilm+i{h). The m.s.e. matrix ofY (to + h) is then: 



m.s.e.(r o (i + h)\V) - a 2 {h)A o T. x A t 0i (19) 

where a 2 (h) := 7 X (0) - SQifT m+ ic(h) = 1 - j m+ i(h) t T^- +l 'j m+ i(h). 
(ii) The statistic 

Y u (t + h) := M „ + C(Y o (t + h) - Ho) (20) 

is a unbiased predictor of Y u (to + h) via D, where C = AnEx^^oSyi^)" 1 
and Y (to + h) is as in n\M. The m.s.e. matrix ofY u (to + h) is: 



m.s.e.(Y u (t Q + h)\V) = o 2 {h)CAj: x A t C t + m.s.e.(Y u (t )\V), 



where C is as in ( |20| and where m.s.e. (Y u (tQ)\T>) is as in (16). 

(Hi) The statistics in (i) and (ii) yield the best m.s.e. predictors in the sense 
of Proposition^ (ii). IfY(t) is non-Gaussian, then these predictors are b.l.u.p. 
in terms of the data T>. 
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Figure 1: The Internet2 backbone network consists of 9 nodes and 26 one-directional 
links. All links have capacity of 10 Gbs/s, with the exception of the links: Chicago- 
Kansas, Kansas-Salt Lake City, New York-Washington, and Washington-Atlanta, all 
of which have doubled capacity of 20 Gbs/s in each direction. 

The proof is given in the Appendix. 

The above results provide, in principle, complete solutions to the kriging 
and the /i-step prediction problems outlined above. The underlying mean 
fiy = A/ix, spatial Sy = AY^xA 1 and temporal covariance structure, however, 
involves unknown parameters. Moreover, their estimation from link measure- 
ments is impossible, without network-specific regularity conditions, since the 
number of links is typically much smaller than the number of routes (L << JJ). 
In |39| , we focus on designing suitable latent models for the unknown means and 
covariances with the help of auxiliary NetFlow data on the route-level traffic. 
These models will involve a few parameters that can be successfully estimated 
from link measurements. 

5 Analysis of Internet 2 Data 

Here, we will first demonstrate the validity of our probabilistic models by using 
real network data. We will then illustrate the performance of the standard 
kriging predictor in practice. 

NetFlow data: We obtained from [17], sampled measurements of all packets 
traversing the Internet2 (12) backbone network (see Fig. [I]). These data were 
used to reconstruct sampled versions of all flows Xj(t), 1 < j < J = 72 in 12. 
Packet and bytes traces over the 100 millisecond time scale were then obtained. 
The routing matrix A for the 12 network was deduced from these NetFlow data 
sets as well and it was found to be constant in time. 

Computationally intensive processing is required to obtain the flow level data 
in practice. Therefore, these data cannot be used directly for fast on-line pre- 
diction of traffic. Nevertheless, we utilize this information to validate the main 
assumptions in our models. Fig. [2] indicates for example that the X/(i)'s are 
nearly uncorrelated in j, which supports the simplifying independence assump- 
tion. On the other hand, the wavelet spectrum of a typical flow indicates that 
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Figure 2: Correlation matrix (absolute values) of the flow-level traffic computed 
from a typical hour-long traces (in bytes per 10 sec). The traces are deduced from 
NetFlow measurements on the Internet2 backbone on Feb 19, 2009. Brighter shades 
indicate numbers close to 1. 



Xj (t) is well-modeled by a fractional Gaussian noise time series for a wide range 
of time scales (see Fig. [3]) Further, barring a few anomalies in the NetFlow data, 
the Hurst exponents along most routes were found to be approximately equal 
(within statistical significance). These observations (along with NS2 simulation 
experiments, not discussed here due to lack space) support the overall validity 
global functional ffim model for the cumulative traffic fluctuations. 

Traffic traces: As indicated above, the NetFlow measurements cannot be used 
directly to readily predict the link loads in real time. We acquired from |17) 
time-synchronized traffic traces of packets and bytes on the 10-second time 
scale, for all links in the Internet2 backbone network. As expected, since RTT 
<< 10 sec, the routing equation (fTl) can be safely assumed to hold for the time 
scales of interest. By using coarse-scale information obtained from the corre- 
sponding NetFlow data, we approximated the mean \xx and variance structure 
of the X,(£)'s. Thus, by using fix and Yjx '■= A\a,g(a\.. 1 < j < J\ we ob- 
tained from ( |15[ ) the standard kriging estimator for a number of scenarios with 
observed and unobserved links. 

Figs [4] and [5] demonstrate the success of our global modeling strategy in the 
context of network kriging. By monitoring just a few links, with the help of 
the standard kriging estimator described above, one can track relatively well 
the traffic load on other links. Table [T] shows further that a given link can be 
relatively well predicted from measurements of as few as two other links. The 
results also show that the choice of which set of link to monitor is an important 
design problem. 

There are, however, objective limitations to the degree to which one can 
predict unobserved links from another set of links. The example in Fig. [6] was 



17 



!f 10 7 


Rout 


3: ATLA to N 


EWY 












25 










-I 20 






























^ 15 

10 



i - I [ - run H y i = 6) = 98422 



5 10 15 20 

Time in Hours (GMT) 



Link: WASH to NEVVY 




5 10 15 20 

Time in Hours (GMT) 




Figure 3: Top-left & right a flow-level traffic trace (bytes per 100 msec) from 
Atlanta to New York for March 17, 2009 and its wavelet spectrum, which yields a 
Hurst parameter H ~ 0.98. Bottom-left & right: a link-level trace (bytes per 10 sec) 
for the Washington to New York link for March 17, 2009 and its wavelet spectrum 
with H ~ 0.99. Observe the similarity between the diurnal patterns and the Hurst 
exponents for the flow- and link-level data. The linearity of the wavelet spectra 
confirms the fractional Gaussian noise model. 



chosen to illustrate these limitations. Observe that even though the coarse-scale 
traffic mean is tracked somewhat, the standard kriging estimator fails to track 
the finer scale behavior and the prediction intervals are rather wide. 

Fig. [7] shows that network kriging may be used in anomaly detection as a 
diagnostic device. Namely, one can predict an observed link from a set of other 
observed links and thus obtain a two-sided p-value based on the prediction 
distribution. Low p-values would indicate the presence of an anomaly. This is 
well demonstrated in Fig. [7] by drop in p-values between 5am and 6am GMT. 
The sudden peak load on the Chicago-Atlanta link is not tracked well by the 
monitored links and the underlying NetFlow data used to recover the overall 
mean and variance structure of the flow-level traffic. Thus, the network kriging 
methodology, based on our probabilistic model, provides an novel global view on 
the statistical significance of traffic patterns in the network. 

6 Discussion 

In this paper, we developed a probabilistic framework for network-wide model- 
ing of traffic, based on multiple sources and large time-scale limit approxima- 
tions. It is shown that depending on the scaling, a fast and slow regime occur 
in the limit. As an extension, one can also consider simultaneous limits as the 
number of sources M = M(T) and as the time scale T tend to infinity, as well 
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Figure 4: Prediction for the Internet2 backbone link Houston to At- 

lanta H0US->ATLA based on the links: SEAT->SALT, SEAT->L0SA, L0SA->H0US, 
ATLA->WASH, CHIC->NEWY. The traces reflect an entire day of activity (February 19, 
2009). Observe the diurnal patterns and the utilization (see the caption of Fig. 111. 
The dotted lines indicate 95% prediction bounds. 
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Figure 5: A zoomed-in portion of Fig. El Observe that the standard kriging estimator 
closely tracks the true traffic trace and note that the prediction bounds are adequate. 
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Figure 6: Prediction for the Internet2 backbone link Chicago to Washington based 
on the same set of observed links as in Fig. H] 
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Figure 7: Top panel: standard kriging for CHIC->ATLA via the same set of observed 
links as in Fig. p] Bottom panel: P-values corresponding to the top panel. 
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Number of links 


Link labels 


Relative m.s.e. 


2 


3,7 


0.07 


2 


7,9 


0.12 


2 


9,12 


0.08 


2 


12,17 


0.41 


2 


17,21 


3.06 


2 


3,21 


0.05 


3 


3,7,9 


0.12 


4 


3,7,9,12 


0.08 


5 


3,7,9,12,17 


0.07 


G 


3,7,9,12,17,21 


0.06 


8 


3,5,7,9,11,12,17,21 


0.06 


10 


3,5,7,9,11,12,17,21,23,25 


0.06 



Table 1: Empirical relative mean squared error for the standard kriging es- 
timator: (r.m.s.e.) = (£f =1 (F u (t) - K(t)) 2 )/(ELi Y u (t) 2 ). The Internet 2 
backbone link 13 (Kansas City to Chicago) was predicted from various sets of 
other backbone links. The data spans the entire day of February 19, 2009. The 
link ID's are described in Table [2j 

as other complex asymptotic scenarios. 

The proposed model proves mathematically tractable, involving few statis- 
tical parameters and therefore perfectly suitable for addressing a number of 
important questions for network-wide traffic behavior. As shown, the model 
can successfully predict traffic loads on unobserved links (network kriging) , em- 
ploying only a limited set of link measurements, provided that some coarse-scale 
information about the traffic means is available (e.g. through NetFlow data). 

The developed network kriging methodology has further applications to 
anomaly detection and diagnostics, as shown in the example of Fig. [7] Since the 
model captures the joint distribution of all links in the network, the multiple 
testing problem associated with anomaly detection for a large number of links 
can be successfully handled, as well. Further, as illustrated in Fig. [6] and Table 
[TJ in the presence of limited resources, it is important to select an "optimal set" 
of links for network monitoring; this model can be used to address this problem 
in the context of network kriging. 

Finally, estimation of the joint distribution of means and covariances of 
traffic flows, across time and over the network, constitutes a challenging, but 
also important problem for network engineering. Our ongoing work is addressing 
this problem through flexible, parsimonious latent variable models that can be 
estimated in real time and without the need for the availability of NetFlow data 
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ID 


Source Destination 


Cap. 


ID 


Source Destination 


Cap. 


1 


Los Angeles-Seattle 


10 Gb/s 


2 


Seattle-Los Angeles 


10 Gb/s 


3 


Seattle-Salt Lake City 


10 Gb/s 


4 


Salt Lake City-Seatle 


10 Gb/s 


5 


Los Angeles-Salt Lake City 


10 Gb/s 


6 


Salt Lake City-Los Angeles 


10 Gb/s 


7 


Los Angeles-Houston 


10 Gb/s 


8 


Houston-Los Angeles 


10 Gb/s 


9 


Salt Lake City-Kansas City 


10 Gb/s 


10 


Kansas City-Salt Lake City 


10 Gb/s 


11 


Kansas City-Houston 


10 Gb/s 


12 


Houston-Kansas City 


10 Gb/s 


13 


Kansas City-Chicago 


20 Gb/s 


14 


Chicago-Kansas City 


20 Gb/s 


15 


Houston- Atlanta 


10 Gb/s 


16 


Atlanta-Houston 


10 Gb/s 


17 


Chicago- Atlanta 


10 Gb/s 


18 


Atlanta-Chicago 


10 Gb/s 


19 


Chicago-New York 


10 Gb/s 


20 


New York-Chicago 


10 Gb/s 


21 


Chicago- Washington 


10 Gb/s 


22 


Washington-Chicago 


10 Gb/s 


23 


Atlanta- Washington 


10 Gb/s 


24 


Washington- Atlanta 


10 Gb/s 


25 


Washington-New York 


20 Gb/s 


26 


New York- Washington 


20 Gb/s 



Table 2: A description of the 26 links that form the Internet2 backbone. Also 
provided are the id numbers used in this paper for notational simplicity. 
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Proposition 8 The functional (f,g) *-$■ <$>2H{f,g) i- n (10 1 is positive semi- 
definite if and only if < H < 1 . 

Proof: Since (t,s) i-> 4>2H(tfo,sfo), t, s € R has the form of the auto- 
covariance of fBm, then it follows that necessarily H £ (0,1] (see e.g. [30]). 
It remains to show that <f> a is positive definite for all a :— 2H £ (0, 2]. 

Let M a , a £ (0, 2] be an SaS random measure with control measure // and 
define 

A(/):= [ fdM a , Vf £ L a (p), 



to be the SaS integral of the deterministic function / (see e.g. Ch. 3 in [30]). 
Notice that for all Xj £ C, and fj £ L a (/j,), with 1 < j < n, we have 



3=1 



x,-x*.Ee iA(/j ~ /fc) 



j,fc=l 

n 

I *■ r I c* 

XjXke nJ * Jfcll ». 
j,fc=l 



Since the l.h.s. of the last expression is always non-negative, so is the r.h.s. This 
shows that the function r a (f,g) := e~H/~9lla ; f,g £ L a (ii) is positive definite. 

Now, the proof proceeds as the proof of the positive definiteness of the auto- 
covariance function of the fractional Brownian motion (see, e.g. p. 106 in [30 ). 
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Indeed, for all Xj <E C, and fj € L a 1 < j < n, and for all e > 0, we have 

n 

< Yl XjXke~ 4f ^ fkK (21) 

j,fc=0 

n n 

n 

=: Si + S 2 + S 3 + \x Q \ 2 

Since x and /o are at our disposal, let /o := and Xq := — Yl!j=i x j e ~ c ■ 
Observe that with this choice of xq and /o, we get 

n 

S 2 = S 3 = -\x \ 2 = - J2 xfre-MM-WMZ, 
j,k=i 

and therefore, Si + S 2 + S3 + \x \ 2 equals: 

n 

J2 XjxJe-^^-f"^ ~ e- 4f ^^-^ fkr A (22) 

n 

= e J2 ^*(ll/ill2 + IIMIS - II/j - MIS) + o(e), 

as e I 0, where the last relation we used the fact that e~ ca — e~ eb = e(6— a)+o(e), 
as e I 0. If for some x/s and //s we have £™ fc=1 a^kdl/jHS + \\fk\\a ~ 11/?' ~ 



/fellS) < 0' then, for all sufficiently small e > 0, the l.h.s. of (22) becomes 
negative, which in view of ( |21[ ), is impossible. This shows that <fi a is positive 
(semi-)definite. 

PROOF: [Proof of PrositionH] To check the equality in distribution of two 
zero mean Gaussian processes, it is enough to show the equality of their auto- 
covariance functions. Let 1 < ^1,^2 < L and ii,ta > 0. Then, by using the 
independence of the B^ (£)'s, we have: 

E(AB H (*i)k(^H(*2)k 

= X! r ( u ) 2l ^ 1 nA f2 (u)?'2if(ii,i2), (23) 

where r 2H {t u t 2 ) = (cr 2 /2)(|ti| 2ff + |£ 2 | 2ff - |£i-^| 2ff )- On the other hand, as 



in (13 1, we obtain 

2 
EB(t 1 f ll )B(t 2 f l2 ) = ^ J2 r(u) 2 (\ti\ 2H l Ali (i 

1<U<J 

+ \h\ 2H l At2 («) - |<ll^ («) - Ma, 2 («)| 2H ) , 

23 



which after cancellations, equals the r.h.s. of (23). 

PROOF: [Proof of Proposition [2] (i): The auto-covariance function of the 
process {B(cf)}f is 

4>2 H {cf, eg) = y (\\cff H + \\cg\\ 2H - ||c/ - cg\\ 2H ) , 
which equals E(c H B(f))(c H B(g)), where || • || stands for || • \\ 2 h- This implies 



g\\ 2H 



( 11 ). The proof of (ii) is similar. 

For simplicity, let now a = 1. Then 

E(B{f + h) - B(h))(B(g + h)- B{h)) 

= \(\\f + h\\ 2H + h + h\\ 2H -\\f 

-\{\\h\\ 2H +\\g+h\r-\\ g \r 
-\{\\f+h\r+\\h\r~\\j\r) 

+\\h\r = \(\\f\\ 2H +\\g\r-\\f-g\r 

which equals EB(f)B(f), and thus implies the equality of the finite-dimensional 
distributions in |12| ), 

(in) Since fg = jj,— a.e. 



Il/-.9l| 2ff = / \f\ 2H d»+ f \g\ 2H d» 

= \\f\\ 2H + h\\ 2H , 

where for simplicity, a = 1. This implies the independence of B(f) and B(g), 
since EB(f)B(g) = 4>2H(f,g) = in view of (pi). 



Part (iv) follows trivially from (10). Now, to prove (v), it is enough to show 
E(B(f + g) — B(f) — B(g)) 2 = if and only if fg = /i— a.e. It can be shown 
that the last expectation equals: 

EB(f + gf + EB{ff + EB{gf - 2EB(f + g)B(f) 
-2EB(f + g)B(g) + 2EB(f)B(g) 

= \\f + g\\ 2H + \\f\\ 2H + \\g\\ 2H 
-(\\f + g\\ 2H + \\f\\ 2H -\\g\\ 2H ) 
-(\\.f + g\\ 2H + \\g\\ 2H -\\f\\ 2H ) 
+\\.f\\ 2H + \\g\\ 2H -\\f-g\\ 2H 

= 2\\f\\ 2H + 2\\g\\ 2H -\\f-g\\ 2H -\\f + g\\ 2H . 

Since < 2H < 2, the last expression vanishes if and only if fg = /i— a.e. (see 
Eq. (2.7.9) in Lemma 2.7.14 of [50]). 
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PROOF:[Proof of Proposition [I] Let as in [3D], ||£|| Q denote the scale coeffi- 
cient of the a stable random variable £. To prove (i), it suffices to show that 
for all fj € L 1 ^), and 6j el, 1 < J < n, we have that 

ii E Wtfi)iis = iic 1/a E w^iis- 

The l.h.s. of this expression equals: 

/ I E 0J'( 1 (-°°.c/.;(u))( a: ) _1 (-oo,o)(»)) dxn{du). 

Jrxe i 1 < J -<„ 

By setting z := x/c, we obtain that the last integral equals: 

c / E j ( 1 (-°o, /.,(«)) (2) - l(-oo,o)(z)) dzn(du), 

JRXE ' n^.^_ 



l<i<" 



which is 

He 1 /- e 0i A (£)ii«- 

l<j<n 

This completes the proof of (i). 

Part (ii) can be established similarly by using the Fubini's theorem and the 
change of variables z := x — /i(u) in the integral 

/ I E 6 'j( 1 (-oo,/(«)+'i(«))( a; ) ~ 1 (-oo, /,(«)) (#)) dx/j,(du), 
Jrxe 1 1 < J < n 

wmch equals J|£ 1 < i < n i (A(/ + /i)-A(/ l ))||S- 

(m,): In view of Theorem 3.5.3 in |30| . A(/) and A(g) are independent if and 
onfo/ i/ 

(l(-oo,/(u))(») - 1 (-oo,0)(*))(l(-oo,5(«))(*) - l(-oo,0)(^)) = 0, 

for dx x n(du) almost all (x,u). By considering cases for the signs of f(u) and 
g(u), it follows that the latter equality holds (for dx x /j,(du) almost all (x, u)) 
if and only if f(u)g(u) < fi(du)-&.e. 
(iv): Let / := and observe that 



A(/ fc ) - A(/ fc _i) = / l Ak (x,u)M a (dx,du), 

JRxE 

where A& = {(x, u) : fk-i(u) < x < fk(u)}, for 1 < k < n. Again, by Theorem 
3.5.3 in |30j . we have that the above increments are independent, if and only if, 
the sets Ak, 1 < k < n are mutually disjoint (mod dx x dfi), which is clearly 
the case here. 

The proof of (v) is similar to that of (iv). 
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(vi): Follows (ii) and part (iv) applied to the independent processes A(i/ + ) 
and A(t/_), where f± = max{±/, 0} since A{tf) = A(t/+) - A(t/_). 

PROOF: [Proof of Proposition [5] To prove that EB(f)B(g) = cp2H(f,g), it 
suffices to show that 



Var(B(/) - B( 5 )) = E(B(/) - B(g)) 2 = * 2 \\f - g\\ 



2H- 



This is indeed the case: By using changes of variables and Fubini's theorem, we 
have that 

E(B(f) - B{g)f 

((/(«) - »)+ - (g( u ) - X )+~ 1/2 J dxfi(du), 



equals 



\f{u)- g {u)\ 2H - l ({l-xlc(u)) H + - 1 ' 2 

— (— ayc(u))_i_ J dxfi(du) 

a 2 j \f{u)-g{u)\ 2H ^du) = a 2 \\.f-g\ 



iHi 



where c(u) = f(u) — g(u) and a 2 = / K ((l — a;) + — {—x) + ) 2 dx. 

PROOF:[Proof of Proposition |6] Let /iy = EY(t ) = (/4, /U*)' an d 

E y = E(F(t ) - Hv)(Y(t ) - hyY = ( y uu y 110 ) , 

where Ejj = A;EjfA*, i,j G {o, u}. The conditional distribution of Y u (to)\Y (to) 
is Gaussian and: 

^(*o)|n(*o) ~ M"(j*u + EooE" 1 ^ - Mo), E uu - EuoE^Eou) (24) 

(see eg Theorem 1.6.6 in [4]). Thus, an unbiased predictor of Y u (to), given Y (to) 
is: 

Y u (t ) := E(Y u (t)\Y (t)) = (x u + E^E" 1 ^ - Mo ). (25) 

This implies (15) and (JIB]) with I? replaced by Y" (£o). Proposition [9] below 



implies, however, that Y u (to) — Y u (to) and 1^(£) are uncorrelated, for all tQ — m < 
t < to. This completes the proof of (%). 

To prove (ii), let be a constant vector of the same dimension as Y u (to). Con- 
sider the random variable £ := 0*3^i(to). It is well-known that E(£|Y" (£o)) is 
the best unbiased m.s.e. predictor of £ via Y (to). Thus 



e t m.s.e.(Y u {to)\Y {t ))6 < t m.a.e.(Y2(to)\Y o (to))O, 

which implies (ii) and completes the proof. 

The following result shows that if the space-time covariance structure of a ran- 
dom field factors, then the instantaneous standard kriging estimate is an optimal 
linear predictor even in the presence of additional data from the past. 
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Proposition 9 Let {^(t,x)}n x \(zxxs be a finite variance space-time random 
field. Suppose that E£(i, x) = 0, for all (t,x) £ T x S and that 

Cov(Z(t,x) t £(9,y)) = j(t,s)R(x,y), 

for all t,s £ T and x,y € S. Consider the data set T> — {£,(ti,Xj), < i < 
m, 1 < j < n} of observations of the random field at times to,ti, ■ ■ ■ ,t m and 
locations X\, ■ ■ ■ x n . Then, there exist coefficients ftj, 1 < j < n, such that 

n 

^(to,x )~J2^^ x o) ( 26 ) 

is the best linear in T>, unbiased predictor of £(to,Xo). In particular, we have 
/3 = Y,t c, where S to = (Cov(£(£ ,£j),£(io, Xj))) nxn (27) 

andc= (Cov(!;(to, xq), £,(to, Xi)))™ =1 . Here S7 denotes the Moore-Penrose gen- 
eralized inverse of the covariance matrix Y* to ■ 

Proof: Consider the Hilbert space C 2 of finite variance random variables 
with zero means and the usual inner product (£,77) := E£??. Consider the sub- 
space W — span(2?) < C 2 and observe that the best linear in T> unbiased 
predictor for £(£cb£o) is the (unique) orthogonal projection of £(to>£o) onto W. 

Let ^(toj^o) be the orthogonal projection of £(£o,a;o) onto the smaller sub- 
space span{£(£o, Xj), 1 < j < n}. We then have that, for all k = 1, • • • , n 

n 

= Cov((£(io,xo)-^&£(*o,^)), £(*o,a*)) 

n 

= 7(i ,io)-R(zo,#fc) ~ ^ /3 j ~/(t a ,t )R(x j ,x k ). 
3=1 

This, since 7(^0^0) 7^ 0, shows that 

n 

R(x a ,x k )-^2l3 ] R(x : j,x k ) = 0, foralll<fc<n. (28) 

We will show next that £(£o> #o) — S?=i /$j£(*o> x j) i s orthogonal to £(£,, a^) for 
all i = 1, • • • , m and fc = 1, • • • , n. Indeed, 



CovU(t ,a;o) - X^'^* '^' )'£(**' 



»fe. 



i=i 



f(to,ti)R(x a ,x k ) - ^2/3jj(to,tj)R(xj,Xk) 



j=i 



n 

^(to,t»)f-R(a;o,a;fe) - y^ffj-Rfe^fc) ) = 0, 



27 



where the last term vanishes because of ([28]). This implies that £,(to,xo) is in 
fact the orthogonal projection of £(to,%o) onto W and hence, it is the b.l.u.p. 
in terms of the data in V. 



Relation (27) follows by solving ([28]). If E to is invertible, then the solution 
is certainly unique, otherwise the Moore-Penrose generalized inverse E^~ yields 
a particular natural solution. 

Proof: (Proposition [7]) Part (i) is standard in one dimension (see eg Corol- 
lary 5.1.1 in [3]). For completeness, we will prove the result in the case when 
Y (t) £ R d . Let E 00 = Ao^xAl and observe that E(Y (t) - fi )(Y (s) ~ /i )* = 
7x(|t-s|)S o0 . 

Consider now the zero mean Gaussian vectors: £ := Y (to + h) — jjl and 

V = (Y o (t Y -nl • • • , Y o (t - to)' - /**)*. 

Note that £ ~ A/"(0, E 00 ) and T) ~ J\f(0,T m+ i <g> E 00 ), where '(g)' denotes the 
Kronecker product: 



= ( 7 x(|i-j|)S 00 ), sj 

V / (m+l)d> 



Tm+1 <8 E 00 — 

V / (m+l)dx(m+l)d 

and where E OD is a d x d matrix. By assumption, we have that E 00 is invertible, 
and as argued above, so is the Toeplitz matrix r m +i, since Jx{k) — > 0, k — > oo 
(Proposition 5.1.1 in [4 ]). This implies that S^ 1 := (r m+1 <8 S 00 ) _1 = r"^ ® 
E^, 1 exists. Therefore, the conditional distribution £(77 is as follows: 

£|77 - ^(s^E-^r?, E« - E^E-%^ , (29) 

where 

E e „ = ££??' = 7 m+ i(/i)* (8) E 00 , E w = E?to* = r m+i E c 

and 

E,,£ = 7m+lC*) ®E 00 . 

By recalling the definitions of £ and to we obtain that 

E(Y o (t + h)\V)=n o + E(£\n) 

= Mo + (im+iih) 1 <8 E 00 )(r m 1 +1 ® E" 1 )?? 

= Mo + (fm+i W*r;i 1 ) (8> i d )r/, 



^001 



which equals (18), and where in the last relation we used the mixed-product 



property of the Kronecker product. By Relation (29), we also have 



m.s.e.(Y o (t + h)\V) = E c? - E^E^E^ = 7x (0)E o 

-(7 OT+ i(/i)' <8 ^ 00 )(T-^ +1 <g> S-^l+iM (8> E 00 ) 
-7x(0)E oo 
-(7m+i(/i)*r~^ fl 7 m+1 (/i)) (8 E 00 ee <t 2 (/i)E 00 , 



2* 



by the mixed-product property of the Kronecker product and the fact that 
7m+i(^)*r m 1 fi7m+i(^) is a scalar. We have thus shown (19 1. 



We now focus on proving (ii). Consider Y (to + h) and write 

Y u (t + h) := /!„ + C(Y o (t + h) - /i G ) + C(Y o (t + h) - Y o (t + h)). 

As in Proposition |6| one can show that Y u (t + h) — CY o (t + h) is independent 
from Y (t), for all t < t + h. Therefore, 

m.s.c.(Y u (t + h)\V) = m.s.e.(F„(t + h)\Y o (t + h)) 

+Cm.s.c.{Y {tv + h)\V)C\ 

where in the last relation m.s.e.(F u (io + h)\Y (to + h)) stands for the m.s.e. of 



the standard Kriging estimator in Relation ( 16 ) and where m.s.e. (Y (to + h)\T>) 



is as in (19). This completes the proof of (ii). 



To prove (Hi) observe that the estimator in (i) is the conditional expectation 
of Y (to + h) given V and it is therefore the best m.s.e. predictor. If Y(t) is 
non-Gaussian, this yields only the b.l.u.p. By Proposition [6l we have that 

E(V M (£o + h) | {Y {t), t < to + /i}) = M« + C(Y o (t + h) - Mo ), 

on the other hand, by part (i), we have that 

E(/i u + C(Y o (t +h)- fi )\V) = fi u + C(Y o (t + h)- Mo). 

The last two relations yield: ¥,(Y u (to + h)\T>) = \x u + C(Y {% -\-h) — /J, ), which 

shows that Y u (to + h) is the best m.s.e. predictor. In the non-Gaussian case, 
this is merely the b.l.u.p. 
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